Approximating model probabilities in Bayesian information criterion and decision-theoretic approaches to model selection in phylogenetics.

Authors

  • Jason Evans
  • Jack Sullivan
Abstract

A priori selection of models for use in phylogeny estimation from molecular sequence data is increasingly important as the number and complexity of available models increase. The Bayesian information criterion (BIC) and the decision-theoretic (DT) approach derived from it rely on a conservative approximation to estimate the posterior probability of a given model. Here, we extend the DT method by using reversible jump Markov chain Monte Carlo approaches to directly estimate model probabilities for an extended candidate pool of all 406 special cases of the general time reversible + Γ family. We analyzed 250 diverse data sets to evaluate the effectiveness of the BIC approximation for model selection under the BIC and DT approaches. Model choice under DT differed between the BIC approximation and direct estimation methods for 45% of the data sets (113/250), and these differences in model choice resulted in significantly different sets of trees in the posterior distributions for 26% of the data sets (64/250). The model with the lowest BIC score differed from the model with the highest posterior probability in 30% of the data sets (76/250). When the data indicate a clear model preference, the BIC approximation works well enough to yield the same model choice as directly estimated model probabilities, but a substantial proportion of biological data sets lack this characteristic, which leads to the selection of underparameterized models.
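The abstract contrasts two ways of obtaining model probabilities. The Python sketch below illustrates them under stated assumptions and is not the authors' implementation. All function names are hypothetical. It makes four points:

* The 406 candidates are the 203 ways (the Bell number B6) of grouping the six GTR exchangeability rates into classes of equal value, each with or without +Γ.
* The BIC approximation, assuming equal prior model probabilities, is P(M_i | X) ≈ exp(-BIC_i/2) / Σ_j exp(-BIC_j/2), with BIC = -2 ln L + k ln n.
* In the sketch, n is assumed to be the number of alignment sites.
* Direct (reversible jump MCMC) estimation takes the fraction of samples spent in each model.

The DT step picks the model with the lowest posterior expected loss. Here the loss is assumed to be the Euclidean distance between branch-length estimates on a fixed tree. The log-likelihoods and branch lengths below are made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, Iterable

# Hypothetical helper names; illustrative sketch only.


def bell_number(n: int) -> int:
    """Number of ways to partition n labelled items into groups (Bell triangle)."""
    row = [1]
    for _ in range(n):
        nxt = [row[-1]]
        for value in row:
            nxt.append(nxt[-1] + value)
        row = nxt
    return row[0]


def bic(log_likelihood: float, k: int, n: int) -> float:
    """BIC = -2 ln L + k ln n (smaller is better)."""
    return -2.0 * log_likelihood + k * math.log(n)


def bic_model_probabilities(bics: Dict[str, float]) -> Dict[str, float]:
    """Approximate P(M_i | X) from BIC scores, assuming equal prior model probabilities."""
    best = min(bics.values())
    # Subtracting the best score before exponentiating avoids underflow.
    weights = {m: math.exp(-0.5 * (b - best)) for m, b in bics.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def sampled_model_probabilities(visited: Iterable[str]) -> Dict[str, float]:
    """Direct estimate: fraction of MCMC samples spent in each model."""
    counts = Counter(visited)
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()}


def dt_select(probs: Dict[str, float], loss: Dict[str, Dict[str, float]]) -> str:
    """Choose the model a minimising the expected loss sum_b loss[a][b] * P(b | X)."""
    risk = {a: sum(loss[a][b] * p for b, p in probs.items()) for a in probs}
    return min(risk, key=risk.get)


if __name__ == "__main__":
    # GTR special cases: partitions of the 6 exchangeabilities, each with or without +Gamma.
    print(2 * bell_number(6))  # prints 406

    n_sites = 1000  # hypothetical alignment length
    # Hypothetical (log-likelihood, free substitution parameters); not data from the paper.
    # Branch lengths are shared across models on a fixed tree, so they are left out of k.
    fits = {"JC+G": (-5230.0, 1), "HKY+G": (-5190.0, 5), "GTR+G": (-5185.0, 9)}
    scores = {m: bic(lnl, k, n_sites) for m, (lnl, k) in fits.items()}
    probs = bic_model_probabilities(scores)

    # Hypothetical branch-length estimates on one fixed tree under each model.
    branch_lengths = {
        "JC+G": [0.10, 0.20, 0.05],
        "HKY+G": [0.12, 0.22, 0.06],
        "GTR+G": [0.12, 0.23, 0.06],
    }
    loss = {a: {b: math.dist(branch_lengths[a], branch_lengths[b]) for b in fits} for a in fits}
    print(dt_select(probs, loss))  # prints HKY+G
```

With these made-up numbers, HKY+G has the lowest BIC and also the lowest expected loss. To get the directly estimated variant the abstract compares against, replace `bic_model_probabilities` with `sampled_model_probabilities`, applied to the model labels visited by a reversible jump MCMC run.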

Similar resources

An Efficient Bayesian Optimal Design for Logistic Model

Consider a Bayesian optimal design with many support points, which poses the problem of collecting data with only a small number of observations at each design point. Under such a scenario, the asymptotic property of using the Fisher information matrix to approximate the covariance matrix of posterior ML estimators may be doubtful. We suggest using the Bhattacharyya matrix in deriving the information matri...

Model Selection in Phylogenetics

Investigation into model selection has a long history in the statistical literature. As model-based approaches begin dominating systematic biology, increased attention has focused on how models should be selected for distance-based, likelihood, and Bayesian phylogenetics. Here, we review issues that render model-based approaches necessary, briefly review nucleotide-based models that ...

Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests.

Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the selection of substitution models in phylogenetics from a theoretical, philosophical and practical poi...

Genetic Properties of Some Economic Traits in Isfahan Native Fowl Using Bayesian and REML Methods

The objective of the present study was to estimate heritability values for some performance and egg quality traits of native fowl at the Isfahan breeding center using REML and Bayesian approaches. About 51521 and 975 records were available for performance and egg quality traits, respectively. In the first step, variance components were estimated for body weight at hatch (BW0), body weight at 8 weeks of a...

Estimation of genetic parameters of litter size in Moghani sheep using threshold model via Bayesian approach

This study was conducted to estimate the genetic parameters of litter size (LS) in Moghani sheep using threshold model via Bayesian approach. The data originated from the Jafar-Abad Station of Ardabil province, Iran, and included 9698 lactation records of 4977 ewes with lambings from 1995 until 2010. The pedigree file consisted of data on animals born from 1987 to 2010. The significance of fixe...

Journal:
  • Molecular biology and evolution

Volume 28, Issue 1

Publication date: 2011